Advertisement: Support JavaWorld, click here!
April 1999
HOME FEATURED TUTORIALS COLUMNS NEWS & REVIEWS FORUM JW RESOURCES ABOUT JW






ARCHIVE

TOPICAL INDEX
Core Java
Enterprise Java
Micro Java
Applied Java
Java Community

JAVA Q&A INDEX

JAVA TIPS INDEX

JavaWorld Services

Free JavaWorld newsletters

ProductFinder

Education Resources

White Paper Library

NEW! Rational Resources


XML for the absolute beginner

A guided tour from HTML to processing XML with Java


Printer-friendly version Printer-friendly version | Send this article to a friend Mail this to a friend


Page 4 of 10

Advertisement

Make up a markup
While a well-formed document is well-formed because it follows rules defined by the XML spec, a valid document is valid because it matches its document type definition (DTD). The DTD is the grammar for a markup language, defined by the designer of the markup language. For my little XML recipe in Listing 3, for example, that designer would be me. The DTD specifies what elements may exist, what attributes the elements may have, what elements may or must be found inside other elements, and in what order.

Nonvalidating parsers read the XML and, if it's well-formed, give you back the document structure as a tree of objects. We'll discuss the document structure you get from a parser in the section below entitled "The Document Object Model." If the document is well-formed but the elements are nonsensical (as was the case with the two <Qty> elements in the <Ingredient> above), that's your problem.

This is, in fact, how HTML browsers work. Generally, HTML parsers are nonvalidating. The various "HTML checking" parsers, which report sytax errors in HTML, are essentially validating HTML parsers (with additional functionality, like link checking).

Validating parsers read XML, verify that it's well-formed (just as nonvalidating parsers do), and then go on to determine whether the document's element tags are legal, whether the attribute names make sense, whether every element nested inside another element belongs there, and so on.

The DTD defines the document type. It accounts for the Extensible in XML. The DTD is how you actually define a new markup language -- what I often call a dialect of XML. DTDs currently are being written for an enormous number of different problem domains, and each DTD defines a new markup language. New markup languages now exist, or are being designed, to mark up the plays of Shakespeare; to define general data resources (RDF); to model information in the health care industry (HL7 SGML/XML); to typeset, display, and actively use mathematical equations (MathML); and to perform electronic data interchange (XML/EDI). There's even a proposal for a markup language for business data in the footwear industry (FDX). (No, I'm not joking.)

Central to each of these new languages is a DTD that describes what tags the markup language has, what those tags' attributes may be, and how they may be combined. A DTD specifies very clearly what information may or may not be included in a markup language. For instance, the DTD for HTML does not allow for markup tags to select paper size for printing.

Let's take a look at a DTD for the recipe XML in Listing 3. I'm going to call it JWSRML (JavaWorld Scary Recipe Markup Language). Apologies to anyone already using that acronym.


<!-- This is the example DTD for the example XML -->
<!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Ingredients (Ingredient)*>
<!ELEMENT Ingredient (Qty, Item)>
<!ELEMENT Qty (#PCDATA)>
<!ATTLIST Qty unit CDATA #REQUIRED>
<!ELEMENT Item (#PCDATA)>
<!ATTLIST Item optional CDATA "0"
                  isVegetarian CDATA "true">
<!ELEMENT Instructions (Step)+>

Listing 4. The DTD for JWSRML

The document type definition in Listing 4 defines a language for a validating parser to accept -- meaning, the parser will produce errors if the rules listed in the DTD aren't followed. To get a general idea of how a DTD works, let's look at what a few of the lines in this file mean.

  • <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
    The <!ELEMENT...> statement defines a tag in the document. This tag defines a <Recipe> tag, stating that it can contain a <Name>, an optional <Description> (the question mark [?] denotes optionality), an optional<Ingredients> tag, and an optional<Instructions> tag.

  • <!ELEMENT Name (#PCDATA)>
    This simply states that a <Name> tag can contain character data and nothing else.

  • <!ATTLIST Item optional CDATA "0" isVegetarian CDATA "true">
    This section states that the <Item> tag has two possible attributes: optional, whose default value is 0; and isVegetarian, whose default value is true. Notice that attribute values aren't limited to numbers; they can be any text.

A DTD is associated with an XML document by way of a document type declaration, which appears at the top the XML file (after the <?xml...?> line). The document type declaration may contain either an inline copy of the document type definition or contain a reference to that document as a system filename or URI (universal resource ID). For example,


<!DOCTYPE Recipe SYSTEM "example.dtd">

tells the parser to start looking for a <Recipe> tag as the top-level tag of the document. It also declares that the DTD is in the system file example.dtd.

There are other characters and notations in the DTD, but writing DTDs is a topic unto itself. If you're interested in learning more, check out the DTD-related links in Resources.

You now know a lot about how XML is structured and controlled, but you haven't heard what it's good for. Why are people so excited about this technology?


Next page >
Page 1 XML for the absolute beginner
Page 2 HTML: All form and no substance
Page 3 An XML conceptual example
Page 4 Make up a markup
Page 5 So, what good is made-up markup?
Page 6 Cascading Style Sheets: not just for HTML anymore
Page 7 XSL: I like your style
Page 8 Modeling information structure in XML
Page 9 XML and Java
Page 10 Become a tree surgeon!

Printer-friendly version Printer-friendly version | Send this article to a friend Mail this to a friend



Advertisement: Support JavaWorld, click here!


HOME |  FEATURED TUTORIALS |  COLUMNS |  NEWS & REVIEWS |  FORUM |  JW RESOURCES |  ABOUT JW |  FEEDBACK

Copyright © 2003 JavaWorld.com, an IDG company